(57) Abstract

A branch prediction scheme predicts whether a computer instruction will cause a branch to a non-sequential instruction. A prediction
counter is selected by performing an exclusive or (XOR) operation between bits from an instruction address and a hybrid history. The
hybrid history, in turn, is derived by concatenating bits from a global history register with bits from a local branch history table. The bits
from the local branch history table are accessed by using bits from the instruction address.

METHODS AND APPARATUS FOR BRANCH PREDICTION
USING HYBRID HISTORY WITH INDEX SHARING 

BACKGROUND OF THE INVENTION 

A. Field of the Invention 

The invention generally relates to computer architecture, and, more particularly, to
branch prediction.

B. Description of the Related Art 

Modern high performance computer processors typically employ pipelining to increase 
performance. "Pipelining" refers to a processing technique in which multiple sequential 
instructions are executed in an overlapping manner. A general description of pipelining 
can be found in "Computer Organization & Design" by David A. Patterson and John L. 
Hennessy (2d ed. 1998, pp. 436-516).

Fig. 1 shows the timing of instruction processing in a conventional five-stage pipeline 
processor architecture. With such an architecture, the processor can simultaneously 
process different stages of up to five successive instructions. The five stages shown in 
Fig. 1 are: IF (instruction fetch), ID (instruction decode), EX (execute instruction), MEM 
(memory access), and WB (write back to register). 

For example, at clock cycle 1, the processor fetches instruction I1. At clock cycle 2,
the processor decodes instruction I1 and fetches instruction I2. In the same manner, the
processor continues to process instructions as they are received; by clock cycle 5, the
processor writes back the result of instruction I1, accesses memory for instruction I2,
executes instruction I3, decodes instruction I4, and fetches instruction I5. In contrast, a
non-pipelined architecture would complete processing of an entire instruction (e.g.,
instruction I1) before beginning to process the next instruction (e.g., instruction I2).
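
The overlap described for Fig. 1 can be tabulated with a short sketch. The Python below is illustrative only: the function name stage_at and the assumption of an ideal pipeline with no stalls are not taken from the description.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_at(instruction, cycle):
    """Return the stage occupied by instruction number `instruction` (1-based)
    in clock cycle `cycle` of an ideal, stall-free five-stage pipeline,
    or None if the instruction is not in the pipeline during that cycle."""
    # Instruction 1 enters IF in cycle 1, instruction 2 in cycle 2, and so on.
    offset = cycle - instruction
    if 0 <= offset < len(STAGES):
        return STAGES[offset]
    return None

if __name__ == "__main__":
    # Print one row per instruction, one column per clock cycle.
    for i in range(1, 6):
        row = [stage_at(i, c) or "--" for c in range(1, 10)]
        print(f"I{i}: " + " ".join(f"{s:>3}" for s in row))
```

Evaluating stage_at for cycle 5 reproduces the situation in the text: the first instruction is in WB while the fifth is in IF.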

When program flow is perfectly sequential, a pipelined architecture can achieve 
significant performance advantages over non-pipelined architecture. In actual programs, 
however, approximately twenty percent of program instructions are branches. Branch 
instructions cause a program to deviate from a sequential flow. Consequently, the 
instruction to be executed (the target of the branch) may not be the next instruction in the 
fetch sequence. 

A processor may recognize that an instruction is a branch instruction in the IF stage
(the first stage of the five-stage pipeline). For conditional branch instructions, however,
the processor typically cannot determine whether the branch should be taken until it
reaches the EX stage (the third stage of the five-stage pipeline). By this time, the
processor has already fetched and begun processing the next two instructions. The
processing of those two instructions is wasted and inefficient if the branch instruction
redirects program flow to another location.

Referring to Fig. 1, if instruction I1 is a conditional branch instruction that redirects
flow to instruction I6, the processor does not recognize this until clock cycle 3 (EX),
when the processor is executing instruction I1. By this time, the processor has already
fetched instruction I2 during clock cycle 2, and decoded instruction I2 and fetched
instruction I3 during clock cycle 3. This processing of instructions I2 and I3 is wasted,
however, because branch instruction I1 causes flow to skip to instruction I6, with no
further processing of instructions I2 or I3. Moreover, the branching causes a stall in the
pipeline while the correct instruction (I6) is fetched. These inefficiencies caused by
branches become exacerbated when deeper pipelines or superscalar processors are used 
because it takes longer to resolve a branch. 

One approach to solving this problem, called branch prediction, involves making 
accurate, educated determinations about whether an instruction will result in a branch to 
another location. Branch prediction is premised on the assumption that, under similar 
circumstances, the outcome of a conditional branch will likely be the same as prior 
outcomes. Because branch prediction can be implemented in the IF stage of processing, 
there is no wasted instruction processing if the result of the conditional branch is always 
predicted correctly. 

Conventional branch prediction techniques include correlation-based schemes and 
global branch history with index sharing ("gshare"). Although these techniques are 
somewhat effective, the frequency of erroneous prediction using these techniques may be
unacceptable. There remains, therefore, a need for a branch prediction scheme that 
reduces the frequency of erroneous prediction. 

SUMMARY OF THE INVENTION

In accordance with the invention, as embodied and broadly described herein, a method
of predicting whether a branch will be taken involves reading bits from a local history table
and concatenating them with bits from a global history register. The result of the
concatenation is combined with bits from the instruction address by performing an
exclusive or operation. The result of the exclusive or operation is used to read a branch
prediction table.

In accordance with the invention, an apparatus for predicting whether a branch will 
be taken comprises a local history table and a global history register. The local history
table and the global history register are connected to inputs of a concatenating circuit. The
output of the concatenating circuit is connected to one input of an exclusive or circuit, 
with an instruction address source being connected to another input. The output of the 
exclusive or circuit is connected to an input of a branch prediction table. 

It is to be understood that both the foregoing general description and following 
detailed description are intended only to exemplify and explain the invention as claimed. 

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this 
specification, illustrate embodiments of the invention and, together with the description, 
serve to explain the advantages and principles of the invention. In the drawings, 

FIG. 1 shows the timing of instruction processing in a conventional five-stage pipeline 
processor architecture; 

FIG. 2 is a block diagram depicting a first system for branch prediction, consistent 
with the invention; 

FIG. 3 is a block diagram depicting a second system for branch prediction, consistent 
with the invention; 

FIG. 4 is a flow diagram of steps performed to predict whether a branch will be taken,
consistent with the invention; and 

FIG. 5 is another flow diagram of steps performed to predict whether a branch will be 
taken, consistent with the invention. 

DETAILED DESCRIPTION

The invention will now be described in reference to the accompanying drawings. The 
same reference numbers may be used throughout the drawings and the following 
description to refer to the same or like parts. 

A. Overview 

Methods and apparatus consistent with the invention predict whether an instruction 
will cause a branch to a non-sequential instruction. This is achieved by incorporating 
features of the correlation-based and gshare schemes to obtain a scheme consistent with 
the invention. In particular, a prediction counter is selected by performing an exclusive
or (XOR) operation between (i) a specified number of bits from an instruction address and
(ii) a hybrid history. The hybrid history, in turn, is derived by concatenating (i) a specified
number of bits from a global history register with (ii) a specified number of bits from a
local branch history table. The bits from the local branch history table are accessed by
using a specified number of bits from the instruction address.

B. Architecture 

Fig. 2 is a block diagram depicting one system of branch prediction, consistent with 
the invention. In a preferred embodiment, system 200 includes the following components: 
local history table 220, concatenator 230, global history register 240, XOR 250, and 
branch prediction table 260. System 200 may be controlled and accessed by an instruction 
fetch unit ("IFU") 290. 

Local history table 220 is connected to concatenator 230 via a data path that is 
preferably l bits wide. Global history register 240 is also connected to concatenator 230
via a data path that is preferably g bits wide. Concatenator 230 is connected to XOR 250 
via a data path that is preferably l+g bits wide. XOR 250 is connected to branch 
prediction table 260 via a data path that is preferably l+g bits wide. 

Local history table 220 is a device storing local history data and preferably comprises
2^a shift registers, each having at least l bits. Alternatively, local history data may be
stored in any type of memory, such as a single register, multiple registers, or random
access memory. Each register stores the l most recent conditional outcomes for a set of
instruction addresses that each have the same a address bits in common. When an
instruction in this set results in a branch being taken, a value of 1 is shifted into the
corresponding register. In contrast, a value of 0 is shifted into the corresponding register
if a branch is not taken. Data that corresponds to branch history on a local level is
hereinafter called "local branch history data."

WO 00/43869 PCT/US00/01500

Global history register 240 preferably comprises a shift register having at least g bits. 
These bits represent the g most recent outcomes for any branch instruction, conditional
or not, and regardless of its address. When a branch is taken, a value of 1 is shifted into 
global history register 240. In contrast, a value of 0 is shifted into global history register 
240 when a branch is not taken. Data that corresponds to branch history at a global level 
is hereinafter called "global branch history data." 
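
The two kinds of history can be modeled in a few lines. The following Python is a minimal sketch under stated assumptions: the class name HistoryState, the choice to shift the newest outcome into the least significant bit, and the use of the a address bits just above the two lowest bits to select a local register are illustrative choices, not details fixed by the description above. The record method is intended for a conditional branch, since only conditional outcomes enter the local registers.

```python
class HistoryState:
    """2**a local shift registers of l bits each, plus one g-bit global
    history register. Default sizes follow the a=14, l=4, g=12 example."""

    def __init__(self, a=14, l=4, g=12):
        self.a, self.l, self.g = a, l, g
        self.local = [0] * (1 << a)   # one l-bit shift register per entry
        self.global_history = 0       # single g-bit shift register

    def local_index(self, address):
        # Assumption: skip the two lowest address bits, then keep a bits.
        return (address >> 2) & ((1 << self.a) - 1)

    def record(self, address, taken):
        """Shift the outcome (1 = taken, 0 = not taken) into both histories."""
        bit = 1 if taken else 0
        i = self.local_index(address)
        # Shift left, insert the new outcome, and drop bits beyond the width.
        self.local[i] = ((self.local[i] << 1) | bit) & ((1 << self.l) - 1)
        self.global_history = ((self.global_history << 1) | bit) & ((1 << self.g) - 1)
```

Masking after each shift is what makes the integers behave like fixed-width shift registers: the oldest outcome falls off once l (or g) outcomes have been recorded.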

Concatenator 230 is a device that receives g bits from global history register 240 and 
l bits from local history table 220, and concatenates them together to form an output
having l+g bits. XOR 250 is a device that receives two inputs each having l+g bits, 
performs an exclusive or (XOR) operation between the two inputs on a bit-by-bit basis, 
and creates an output having l+g bits. 
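
The behavior of concatenator 230 and XOR 250 can be sketched as two small functions. This Python is illustrative: the function names are hypothetical, and placing the global bits above the local bits in the concatenated value is an assumption, since the description does not fix the bit order.

```python
def concatenate(global_bits, local_bits, g, l):
    """Form an (l+g)-bit hybrid history from g global bits and l local bits.
    Assumption: global history occupies the upper g bits."""
    global_bits &= (1 << g) - 1   # keep only g bits
    local_bits &= (1 << l) - 1    # keep only l bits
    return (global_bits << l) | local_bits

def xor_index(hybrid, address_bits, g, l):
    """Bit-by-bit XOR of two (l+g)-bit inputs, producing an (l+g)-bit output."""
    mask = (1 << (l + g)) - 1
    return (hybrid ^ address_bits) & mask
```

For instance, with g=3 and l=2, a global history of 101 and a local history of 11 concatenate to 10111, and XOR with address bits 00101 gives 10010.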

Branch prediction table 260 is a device storing branch prediction data and may be 
implemented using a plurality of n-bit saturating counters. Each of these counters stores
data representing whether a branch was taken under a particular circumstance. A 
circumstance may be defined by the input to branch prediction table 260, which in system 
200 may be based on the values of the instruction address, the global history register, and 
the local history register. For a particular circumstance, if a branch is taken, the value of 
the corresponding counter is incremented; if the counter value is already at its maximum 
value, it remains there. If, on the other hand, a branch is not taken, the value of the 
counter corresponding to that circumstance is decremented; if the counter value is already 
at its minimum value, it remains unchanged. 

These n-bit saturating counters are the basis for the branch prediction decision. For 
a particular circumstance, if a branch was previously taken (indicated by the counter 
having a designated value), system 200 predicts that the branch will be taken again. In a 
preferred embodiment, system 200 predicts that the branch will be taken if the most 
significant bit of the n-bit counter equals "1". 
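
A single n-bit saturating counter of the kind described above might be modeled as follows. The class name, the method names, and the initial value of 0 are assumptions made for illustration; the description does not specify how counters are initialized.

```python
class SaturatingCounter:
    """n-bit counter that saturates at 0 and at 2**n - 1."""

    def __init__(self, n=2, value=0):
        self.n = n
        self.max_value = (1 << n) - 1
        self.value = value          # assumed starting value

    def update(self, taken):
        # Increment on a taken branch, decrement otherwise, never wrapping.
        if taken:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = max(self.value - 1, 0)

    def predict_taken(self):
        # Predict "taken" when the most significant bit is 1.
        return bool((self.value >> (self.n - 1)) & 1)
```

With n=2, two consecutive taken outcomes are needed to move a counter from 0 into the "predict taken" half of its range, so a single atypical outcome does not flip a well-established prediction.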




In one embodiment, a=14, l=4, g=12, and n=2. Other values for these variables can
also be used. For example, in another embodiment, a=14, l=2, g=14, and n=2.

Fig. 3 is a block diagram depicting another system of branch prediction, consistent with
the present invention. In a preferred embodiment, system 300 includes the following
components: local history table 220, global history register 240, branch prediction table
260, XOR 310, XOR 320, and column decode multiplexer 330. System 300 may be
controlled and accessed by IFU 290.

Global history register 240 is connected to XOR 310 via a data path that is preferably
g bits wide. XOR 310 is connected to branch prediction table 260 via a data path that is
preferably g bits wide. Branch prediction table 260 is connected to multiplexer 330 via
2^l data paths that are preferably n bits wide. The select line of multiplexer 330 is
connected to XOR 320 via a data path that is l bits wide. XOR 320 is connected to local
history table 220 via a data path that is l bits wide.

XOR 310 is a device that receives two inputs that are g bits wide, performs an XOR
operation on them on a bit-by-bit basis, and generates an output that is also g bits wide.
Similarly, XOR 320 is a device that receives two inputs that are l bits wide, performs an
XOR operation on them on a bit-by-bit basis, and generates an output that is also l bits
wide. Multiplexer 330 receives 2^l inputs that are n bits wide. In response to a control
signal from XOR 320, multiplexer 330 passes along one of the 2^l inputs. The system
shown in Fig. 3 may use the same values for variables a, l, g, and n as stated above in
reference to Fig. 2.

Systems 200 and 300 described in reference to Figs. 2 and 3 may be implemented as 
an integrated circuit as part of one or more computer processors. Alternatively, systems 
200 and 300 may be implemented in discrete logic components or software and may be 
implemented separate from a processor. 

C. Architectural Operation

Fig. 4 is a flow diagram of a process consistent with the invention, and is described
with reference to system 200 shown in Fig. 2.

The process begins with IFU 290 reading local history table 220 (step 410).
Specifically, IFU 290 uses bits a+1:2 of the instruction address to access local history data
from local history table 220. As used herein, the terminology "m:n" denotes bits m
through n, inclusive. In a preferred embodiment, these bits correspond to the a least
significant bits of the instruction address excluding the last two bits. The last two bits are
preferably excluded because they are typically zero in a processor that employs byte
addressing and 32-bit instructions. By accessing local history table 220, IFU 290 causes
it to generate an output that is at least l bits wide.
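
The "m:n" bit-range notation maps directly onto a shift and a mask. The helper below is a minimal Python sketch; the name bits is hypothetical, and bit 0 is assumed to be the least significant bit of the address.

```python
def bits(value, m, n):
    """Return bits m through n of value, inclusive (m >= n), right-aligned.
    Bit 0 is taken to be the least significant bit."""
    width = m - n + 1
    return (value >> n) & ((1 << width) - 1)
```

With a=14, the local history table would be accessed with bits(address, 15, 2), which yields a 14-bit value and ignores the two lowest address bits.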

Those of skill in the art will recognize that the invention is not limited to using a 
portion of the instruction address to access the local history table. For example, the local 
history table could instead be accessed based upon an address that corresponds to a group 
of instructions. 

Concatenator 230 concatenates the l-bit output from local history table 220 with g bits
from global history register 240 (step 420). The output of concatenator 230 may be
referred to as either concatenated history data or a hybrid history. XOR 250 performs an
XOR operation between the l+g bits output by concatenator 230 and bits l+g+1:2 of
the instruction address (step 430). These bits l+g+1:2 correspond to the l+g least
significant bits of the instruction address, preferably excluding the last two bits.

IFU 290 uses the l+g bits resulting from the XOR operation to read branch prediction 
table 260 (step 440). In response, branch prediction table 260 generates an output that 
is n bits wide. IFU 290 then interprets this n-bit output to predict whether a branch will
occur (step 450). Specifically, if the n-bit counter indicates that a branch was taken
previously under similar circumstances, then a prediction is made that the branch will 
again be taken. 
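
Steps 410 through 450 can be combined into one small software model. The Python below is a sketch under stated assumptions rather than a definitive implementation: the class and method names are hypothetical, global history is assumed to occupy the upper bits of the hybrid history, the newest outcome is assumed to enter each history at the least significant bit, counters are assumed to start at 0, and the update method is an assumed training step that Fig. 4 does not show.

```python
class HybridPredictor:
    """Software model of system 200 (Fig. 2)."""

    def __init__(self, a=14, l=4, g=12, n=2):
        self.a, self.l, self.g, self.n = a, l, g, n
        self.local_table = [0] * (1 << a)       # local history table 220
        self.global_history = 0                 # global history register 240
        self.counters = [0] * (1 << (l + g))    # branch prediction table 260

    @staticmethod
    def _bits(value, m, n):
        # Bits m through n, inclusive, right-aligned.
        return (value >> n) & ((1 << (m - n + 1)) - 1)

    def _index(self, address):
        local = self.local_table[self._bits(address, self.a + 1, 2)]   # step 410
        hybrid = (self.global_history << self.l) | local               # step 420
        return hybrid ^ self._bits(address, self.l + self.g + 1, 2)    # step 430

    def predict(self, address):
        counter = self.counters[self._index(address)]                  # step 440
        return bool((counter >> (self.n - 1)) & 1)                     # step 450

    def update(self, address, taken):
        # Assumed training step: adjust the selected counter, then both histories.
        i = self._index(address)
        top = (1 << self.n) - 1
        if taken:
            self.counters[i] = min(self.counters[i] + 1, top)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
        bit = int(taken)
        e = self._bits(address, self.a + 1, 2)
        self.local_table[e] = ((self.local_table[e] << 1) | bit) & ((1 << self.l) - 1)
        self.global_history = ((self.global_history << 1) | bit) & ((1 << self.g) - 1)
```

A typical use would call predict(address) when a branch is fetched and update(address, taken) once the branch resolves. Because the index depends on both histories, a branch that is always taken trains a counter only after the histories themselves have settled to a stable pattern.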

Fig. 5 is a flow diagram of another process consistent with the invention. The process 
shown in Fig. 5 is described with reference to system 300, shown in Fig. 3. 

The process begins with XOR 310 performing an XOR operation between g bits from
global history register 240 and bits l+g+1:l+2 from the instruction address (step 510).
These bits l+g+1:l+2 correspond to the same portion of the instruction address that is
used in the XOR operation with the global history portion of the hybrid history in Fig.
2. IFU 290 uses this g-bit output from XOR 310 as an input to read branch
prediction table 260 (step 520). In response, branch prediction table 260 generates 2^l
outputs that are each n bits wide. These outputs are fed as inputs to multiplexer 330.

Concurrent with the read of branch prediction table 260, the system performs a read
of local history table 220 (step 530). Specifically, IFU 290 reads local history table 220
via bits a+1:2 of the instruction address; the last two bits are preferably excluded as
discussed above in connection with the process shown in Fig. 4. In response to this read
operation, local history table 220 generates an output that is l bits wide. XOR 320
performs an XOR operation between this l-bit output and bits l+1:2 from the instruction
address (step 540). Again, the last two bits are preferably excluded. This creates an
output from XOR 320 that is l bits wide.

IFU 290 uses the l-bit output from XOR 320 as a "select" input to multiplexer
330. In response, multiplexer 330 generates an n-bit output equivalent to one of the 2^l
outputs of branch prediction table 260 (step 550). IFU 290 then interprets this n-bit
output to predict whether or not a branch will occur (step 560). Specifically, if the most
significant bit of the n-bit counter indicates that a branch was taken previously under similar
circumstances, then a prediction is made that the branch will again be taken. In a
preferred embodiment, if the most significant bit of the n-bit counter equals "1", then a
prediction is made that the branch will again be taken.
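
The relationship between the two organizations can be checked with a short sketch. The Python below assumes that the global history portion pairs with the upper g of the l+g address bits (bits l+g+1 through l+2) and the local history portion with the lower l bits (bits l+1 through 2), and that branch prediction table 260 is arranged as 2^g rows of 2^l counters. These layout details, like the function names, are assumptions made for illustration.

```python
def bits(value, m, n):
    """Bits m through n of value, inclusive, right-aligned."""
    return (value >> n) & ((1 << (m - n + 1)) - 1)

def flat_index(address, global_history, local_history, l, g):
    """System 200 style: a single (l+g)-bit index into the table."""
    hybrid = (global_history << l) | local_history
    return hybrid ^ bits(address, l + g + 1, 2)

def row_and_column(address, global_history, local_history, l, g):
    """System 300 style: g-bit row index (step 510) and
    l-bit multiplexer select (step 540)."""
    row = global_history ^ bits(address, l + g + 1, l + 2)
    column = local_history ^ bits(address, l + 1, 2)
    return row, column

def same_counter(address, global_history, local_history, l, g):
    """True when both organizations pick the same counter."""
    row, column = row_and_column(address, global_history, local_history, l, g)
    return flat_index(address, global_history, local_history, l, g) == (row << l) | column
```

Under these assumptions the two schemes select the same counter, because XOR acts on each bit independently. The split arrangement allows the row read (step 520) to proceed while the local history table is still being read (step 530), with the local history needed only at the final multiplexer select.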

D. Conclusion

As described in detail above, methods and apparatus consistent with the present 
invention predict whether a branch will be taken. The foregoing description of an 
implementation of the invention has been presented for purposes of illustration and 
description. It is not exhaustive and does not limit the invention to the precise form 
disclosed. Modifications and variations are possible in light of the above teachings or may 
be acquired from practicing the invention. For example, the data paths between the 
various components may be in the form of integrated circuit connections, wires, or fiber 
optics, to name a few. Similarly, although the description above is based on a processor 
that employs byte addressing and 32-bit instructions, a similar approach could be 
employed with other addressing schemes. Moreover, the description above is based on 
a single-processor pipeline architecture, but the invention may be used in a multiple 
processor environment and non-pipeline processor environments. Furthermore, although
the description above employs an XOR function, other hashing functions could be used 
consistent with the invention. The scope of the invention is defined by the claims and their 
equivalents. 

What is claimed is:

1. A method of predicting whether processing of an instruction is to result in branching
of program flow, comprising: 

accessing local branch history data; 
accessing global branch history data; 

concatenating the accessed local branch history data with the accessed global branch
history data to form concatenated history data; 

performing an exclusive or operation between the concatenated history data and bits 
from the address of the instruction to form an index to branch prediction data; and 

generating branch prediction data based upon the index. 

2. The method of claim 1, further comprising interpreting the branch prediction data to 
determine whether a branch will be taken. 

3. The method of claim 1, further comprising updating the global branch history data 
based upon whether a branch is taken. 

4. The method of claim 1, further comprising updating the local branch history data 
based upon whether a branch is taken. 

5. A method of predicting whether processing of an instruction is to result in branching 
of program flow, comprising: 

accessing global branch history data; 

performing a first exclusive or operation between the accessed global branch history 
data and bits from the address of the instruction to form index data; 
accessing local branch history data; 

performing a second exclusive or operation between the accessed local branch history 
data and bits from the address of the instruction to form select data; 
accessing branch prediction data based upon the index data; and 
selecting at least one of the branch prediction data based upon the select data. 

6. A method of predicting whether processing of an instruction is to result in branching 
of program flow, comprising: 

reading local branch history data from a local history table; 
reading global branch history data from a global history register; 
concatenating the read local branch history data with the read global branch history
data to form concatenated history data;

performing an exclusive or operation between the concatenated history data and bits 
from the address of the instruction to form an index to branch prediction data; and 

reading branch prediction data from a branch prediction table based upon the index. 

7. A method of predicting whether processing of an instruction is to result in branching 
of program flow, comprising: 

reading global branch history data from a global history register; 
performing a first exclusive or operation between the read global branch history data 
and bits from the address of the instruction to form index data; 
reading local branch history data from a local history table; 

performing a second exclusive or operation between the read local branch history data 
and bits from the address of the instruction to form select data; 

reading branch prediction data from a branch prediction table based upon the index 
data; and 

selecting at least one of the branch prediction data based upon the select data. 

8. A method of predicting whether processing of an instruction is to result in branching 
of program flow, comprising: 

providing a portion of an instruction address as an input to a local history table; 
providing an output of a local history table as a first input to a concatenating circuit; 
providing an output of a global history register as a second input to the concatenating 
circuit; 

providing an output from the concatenating circuit as a first input to an exclusive or 
circuit; 

providing a portion of the instruction address as a second input to the exclusive or 
circuit; and

providing an output from the exclusive or circuit as an input to a branch prediction 
table. 

9. A method of predicting whether processing of an instruction is to result in branching 
of program flow, comprising: 
providing an output of a global history register as a first input to a first exclusive or
circuit;

providing a portion of an instruction address as a second input to the first exclusive 
or circuit; 

providing an output from the first exclusive or circuit as an input to a branch 
prediction table;

providing a portion of an instruction address as an input to a local history table; 
providing an output of the local history table as a first input to a second exclusive or 
circuit; 

providing a portion of an instruction address as a second input to a second exclusive 
or circuit; and 

providing an output from the second exclusive or circuit as an input to a select circuit. 

10. An apparatus for predicting whether processing of an instruction is to result
in branching of program flow, comprising: 

means for accessing local branch history data; 
means for accessing global branch history data; 

means for concatenating the accessed local branch history data with the accessed 
global branch history data to form concatenated history data; 

means for performing an exclusive or operation between the concatenated history data 
and bits from the address of the instruction to form an index to branch prediction data; and 

means for generating branch prediction data based upon the index. 

11. An apparatus for predicting whether processing of an instruction is to result 
in branching of program flow, comprising: 

means for accessing global branch history data; 

means for performing a first exclusive or operation between the accessed global 
branch history data and bits from the address of the instruction to form index data; 
means for accessing local branch history data; 

means for performing a second exclusive or operation between the accessed local 
branch history data and bits from the address of the instruction to form select data; 
means for accessing branch prediction data based upon the index data; and 
means for selecting at least one of the branch prediction data based upon the select
data.

12. An apparatus for predicting whether processing of an instruction is to result 
in branching of program flow, comprising: 

means for reading local branch history data from a local history table; 

means for reading global branch history data from a global history register; 

means for concatenating the read local branch history data with the read global branch 
history data to form concatenated history data; 

means for performing an exclusive or operation between the concatenated history data 
and bits from the address of the instruction to form an index to branch prediction data; and 

means for reading branch prediction data from a branch prediction table based upon 
the index. 

13. An apparatus for predicting whether processing of an instruction is to result 
in branching of program flow, comprising: 

means for reading global branch history data from a global history register; 

means for performing a first exclusive or operation between the read global branch 
history data and bits from the address of the instruction to form index data; 

means for reading local branch history data from a local history table; 

means for performing a second exclusive or operation between the read local branch 
history data and bits from the address of the instruction to form select data; 

means for reading branch prediction data from a branch prediction table based upon 
the index data; and 

means for selecting at least one of the branch prediction data based upon the select 
data. 

14. An apparatus for predicting whether processing of an instruction is to result 
in branching of program flow, comprising:

means for providing a portion of an instruction address as an input to a local history 
table; 

means for providing an output of a local history table as a first input to a 
concatenating circuit; 
means for providing an output of a global history register as a second input to the
concatenating circuit;

means for providing an output from the concatenating circuit as a first input to an 
exclusive or circuit; 

means for providing a portion of an instruction address as a second input to the 
exclusive or circuit; and

means for providing an output from the exclusive or circuit as an input to a branch 
prediction table. 

15. An apparatus for predicting whether processing of an instruction is to result 
in branching of program flow, comprising: 

means for providing an output of a global history register as a first input to a first 
exclusive or circuit; 

means for providing a portion of an instruction address as a second input to the first 
exclusive or circuit; 

means for providing an output from the first exclusive or circuit as an input to a 
branch prediction table; 

means for providing a portion of an instruction address as an input to a local history 
table; 

means for providing an output of the local history table as a first input to a second 
exclusive or circuit; 

means for providing a portion of an instruction address as a second input to a second 
exclusive or circuit; and 

means for providing an output from the second exclusive or circuit as an input to a 
select circuit. 

16. An apparatus for predicting whether processing of an instruction is to result
in branching of program flow, comprising:

a first memory storing local branch history data; 
a second memory storing global branch history data; 
a third memory storing branch prediction data; 
a concatenating device having first and second inputs connected to the first memory
and the second memory, respectively, and an output; and 

a XOR device having a first input connected to the output of the concatenating device, 
a second input receiving at least a portion of an address of the instruction, and an output 
connected to the third memory. 

17. An apparatus for predicting whether processing of an instruction is to result
in branching of program flow, comprising: 

a local branch history table; 
a global branch history register; 
a branch prediction table; 

a concatenating device having first and second inputs connected to the local branch 
history table and the global branch history register, respectively, and an output; and 

a XOR device having a first input connected to the output of the concatenating device, 
a second input receiving at least a portion of an address of the instruction, and an output 
connected to the branch prediction table. 

18. An apparatus for predicting whether processing of an instruction is to result 
in branching of program flow, comprising: 

a first memory storing local branch history data; 
a second memory storing global branch history data; 
a third memory storing branch prediction data; 

a first XOR device having a first input connected to the second memory, a second 
input receiving at least a portion of an address of the instruction, and an output connected 
to the third memory; and 

a second XOR device having a first input connected to the first memory, a second 
input receiving at least a portion of an address of the instruction, and an output connected 
to a multiplexer.

19. An apparatus for predicting whether processing of an instruction is to result 
in branching of program flow, comprising: 

a local branch history table; 
a global branch history register; 
a branch prediction table;

a first XOR device having a first input connected to the global branch history register, 
a second input receiving at least a portion of an address of the instruction, and an output 
connected to the branch prediction table; and 

a second XOR device having a first input connected to the local branch history table, 
a second input receiving at least a portion of an address of the instruction, and an output 
connected to a select line of a multiplexer. 

20. A system that predicts whether processing of an instruction is to result in 
branching of program flow, comprising: 

a processor for executing instructions; 

a first memory storing local branch history data and connected to the processor; 
a second memory storing global branch history data; 

a third memory storing branch prediction data and connected to the processor; 

a concatenating device having first and second inputs connected to the first memory 
and the second memory, respectively, and an output; and 

a XOR device having a first input connected to the output of the concatenating device, 
a second input receiving at least a portion of an address of the instruction, and an output 
connected to the third memory. 

21. The system according to claim 20, wherein the processor is configured to 
execute instructions in a pipeline. 

22. The system according to claim 21, wherein the processor comprises: 
an instruction fetch unit for fetching instructions; 

an instruction decode unit for decoding fetched instructions; 
an execution unit for executing the decoded instructions; 
a memory access unit for accessing data from a memory; and 
a write back unit to write data to a memory. 

23. A system that predicts whether processing of an instruction is to result in 
branching of program flow, comprising: 

an instruction fetch unit for fetching instructions to be processed; 
a first memory storing local branch history data and connected to the instruction fetch
unit; 

a second memory storing global branch history data; 

a third memory storing branch prediction data and connected to the instruction fetch 
unit; 

a concatenating device having first and second inputs connected to the first memory 
and the second memory, respectively, and an output; and 

a XOR device having a first input connected to the output of the concatenating device, 
a second input receiving at least a portion of an address of the instruction, and an output 
connected to the third memory. 